Abstract. In the spirit of the emergent field of econophysics, a goodness-of-fit test for the Power-law
distribution based on the Empirical Distribution Function (EDF) is presented, and related problems are
discussed. An analysis of the tail behaviour of the daily logarithmic variation of the Mexican Stock Market
Index (IPC) shows distributional properties that are consistent with previous studies.
1 Introduction

The Power-law distribution, also called the Pareto distribution, describes phenomena arising in the Social Sciences, e.g. Economics and Finance (distribution of individual incomes, distribution of stock market price variations, etc.), and in Physics (phase transitions, nonlinear dynamics, and disordered systems). Although this interest has a tradition of more than a hundred years [1, 2], the interest of the physics community in the complex behaviour of financial markets has increased strongly in recent years, boosted by the availability of huge amounts of electronically recorded economic data from around the world. This has given rise to a complementary approach, non-orthodox from the point of view of traditional Economic Theory, which attacks problems through the analysis of empirical data rather than through traditional economic analysis. The collection of methods and techniques originally developed to study problems arising in physics, and currently being used to understand complex financial systems, is called Econophysics and appears as an emergent branch of Physics in its own right [3, 4, 5, 6, 7, 8].

About 100 years ago, the Power-law distribution was proposed by Pareto to describe the distribution of individual incomes. More recently, building on Mandelbrot's pioneering work [9] and the later work of Mantegna and Stanley [10], analyses of the distribution of price variations of leading stock markets [11, 12, 13, 14] and of individual companies [15, 16] have been reported, all of them showing Pareto tails, with $\alpha \approx 3$ for the stock market case.

In Section 7 of this work, we analyze the distribution of daily logarithmic differences of the Mexican IPC stock index, defined as $S(t) := \log Z(t+1) - \log Z(t)$, for IPC values $Z(t)$ recorded during an almost 10 years period, from April 14, 1990 until December 31, 2002.
A goodness-of-fit test based on the Empirical Distribution Function (EDF) is introduced in Sections 2 to 4. Section 5 explains the test procedures and Section 6 shows the results of a Monte Carlo study used to investigate the small-sample distribution of the test statistics and the speed of convergence to their asymptotic distribution.

2 Empirical Distribution Function (EDF)

Let $Y_1, \ldots, Y_n$ be a random sample from an absolutely continuous distribution $F$ and suppose that we are interested in testing the null hypothesis that the sample was drawn from the distribution

$$F(y; \gamma, \alpha) = 1 - \left(\frac{\gamma}{y}\right)^{\alpha} \qquad (1)$$

with support on $0 < \gamma < y$, for $\alpha > 0$.

The distribution (1) is known as the Power-law distribution. A test of fit can be based on measures of discrepancy between the empirical distribution function $F_n$ and the hypothesized distribution $F$. Such test statistics are referred to as empirical distribution function based statistics or simply EDF statistics. Here we will consider statistics within the class

$$Q_n = n \int_{-\infty}^{\infty} [F_n(y) - F(y)]^2\, \psi(y)\, dF(y) \qquad (2)$$

When $\psi(y) \equiv 1$, $Q_n$ is known as the Cramér-von Mises statistic $W^2$ and, for $\psi(y) = \{F(y)[1 - F(y)]\}^{-1}$, it is known as the Anderson-Darling statistic $A^2$.
The tests are based on the quantities $Z_i = F(Y_i; \gamma, \alpha)$, the Probability Integral Transformation which, under the null hypothesis, produces observations uniformly distributed on (0, 1). To obtain computational formulas for the test statistics, the expression in (2) can be written in terms of the observed discrepancy between the empirical distribution function $F_n^*$ calculated from the transformed observations and the uniform cumulative distribution function; i.e.,

$$Q_n = n \int_0^1 [F_n^*(z) - z]^2\, \psi^*(z)\, dz \qquad (3)$$

where $\psi^*(z) = 1$ for $W^2$ and $\psi^*(z) = [z(1 - z)]^{-1}$ for the Anderson-Darling statistic $A^2$.

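Computational formulas for these statistics involve the ordered sample values $Z_{(1)} < \ldots < Z_{(n)}$:

$$W^2 = \sum_{i=1}^{n} \left[Z_{(i)} - \frac{2i-1}{2n}\right]^2 + \frac{1}{12n} \qquad (4)$$

$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\left[\log Z_{(i)} + \log\{1 - Z_{(n-i+1)}\}\right] \qquad (5)$$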
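As a minimal sketch, formulas (4) and (5) can be coded directly from the transformed values. The function names below are illustrative assumptions and not part of the original text; only the Python standard library is used.

```python
import math

def cramer_von_mises(z):
    """W^2 from formula (4); z holds the transformed values Z_i in [0, 1]."""
    zs = sorted(z)
    n = len(zs)
    # Python index i = 0..n-1 corresponds to the 1-based index i + 1 in (4).
    total = sum((zs[i] - (2 * (i + 1) - 1) / (2 * n)) ** 2 for i in range(n))
    return total + 1 / (12 * n)

def anderson_darling(z):
    """A^2 from formula (5); needs 0 < Z_i < 1 so that both logarithms are finite."""
    zs = sorted(z)
    n = len(zs)
    # zs[n - 1 - i] is Z_(n-i+1) when i is read as a 1-based index.
    total = sum(
        (2 * (i + 1) - 1) * (math.log(zs[i]) + math.log(1 - zs[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n
```

Note that `anderson_darling` fails with a math domain error if any transformed value equals 0 or 1, which is the computational problem that arises when the threshold is estimated by the sample minimum.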



3 Estimation 

Given observed values $y_1, \ldots, y_n$ of a random sample from the distribution (1), the log-likelihood is

$$\Lambda(\alpha, \gamma) = n \log\alpha + n\alpha \log\gamma - (\alpha + 1) \sum_{i=1}^{n} \log y_i \qquad (6)$$



When $\gamma$ is known, the maximum-likelihood estimator of the parameter $\alpha$ is

$$\hat{\alpha} = \left[\frac{1}{n} \sum_{i=1}^{n} \log(y_i/\gamma)\right]^{-1} \qquad (7)$$
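The step from the log-likelihood (6) to the estimator (7) can be made explicit by setting the derivative with respect to $\alpha$ to zero. The short derivation below is a worked check rather than part of the original text.

```latex
\frac{\partial \Lambda}{\partial \alpha}
  = \frac{n}{\alpha} + n \log\gamma - \sum_{i=1}^{n} \log y_i = 0
\quad\Longrightarrow\quad
\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \log(y_i/\gamma)},
\qquad
\frac{\partial^2 \Lambda}{\partial \alpha^2} = -\frac{n}{\alpha^2} < 0 .
```

The negative second derivative confirms that the stationary point is a maximum.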



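If $\gamma$ is unknown, we use an approach proposed by [17], substituting (7) into (6) to obtain the profile log-likelihood for $\gamma$; namely

$$\Lambda^*(\gamma) = n \ln(n) - n \ln\left[-n \ln(\gamma) + \sum_{i=1}^{n} \ln y_i\right] - n - \sum_{i=1}^{n} \ln y_i \qquad (8)$$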



It can be seen that the above function is increasing in the range $0 < \gamma < \exp(\sum_{i=1}^{n} \log y_i / n)$. In fact, since $d\Lambda^*(\gamma)/d\gamma = n\hat{\alpha}/\gamma$, the derivative is positive over its range of admissible values. Thus, the maximum-profile-likelihood estimator for $\gamma$ is $\hat{\gamma} = y_{(1)}$, the minimum sample value.
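The monotonicity argument can be checked numerically. The sketch below evaluates the log-likelihood (6) at the maximum-likelihood value of $\alpha$ for a grid of threshold values up to the sample minimum; the sample and the function names are hypothetical and serve only as an illustration.

```python
import math

def alpha_hat(y, gamma):
    """Maximum-likelihood estimate of alpha for a given gamma (formula (7))."""
    return len(y) / sum(math.log(v / gamma) for v in y)

def profile_loglik(y, gamma):
    """Log-likelihood (6) evaluated at alpha = alpha_hat(y, gamma)."""
    n = len(y)
    a = alpha_hat(y, gamma)
    return n * math.log(a) + n * a * math.log(gamma) - (a + 1) * sum(math.log(v) for v in y)

# Hypothetical sample, for illustration only.
sample = [1.3, 1.1, 2.7, 1.8, 1.05, 4.2, 1.6]
y_min = min(sample)

# Grid of admissible threshold values in (0, y_(1)].
grid = [y_min * k / 10 for k in range(1, 11)]
values = [profile_loglik(sample, g) for g in grid]

# The profile log-likelihood should increase all the way up to y_(1).
is_increasing = all(a < b for a, b in zip(values, values[1:]))
```

Because the profile is increasing on the whole admissible range, its maximum sits at the boundary `y_min`, which is why the estimator equals the smallest observation.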

When $\gamma_i = \gamma$, $i = 1, \ldots, n$, is unknown, the estimate $\hat{\gamma} = Y_{(1)}$ is super-efficient in the sense that its variance tends to zero faster than $1/n$. Using this estimate, we "lose" one sample observation and there are computational problems for calculating $A^2$. Since, in this case, $Z_{(1)} = 0$, we suggest estimating the parameter $\gamma$ by finding the value $\tilde{\gamma}$ of $\gamma$ which satisfies

$$\gamma = y_{(1)} \left[1 - \{n\,\hat{\alpha}(\gamma)\}^{-1}\right] \qquad (9)$$

where $\hat{\alpha}(\gamma)$ is defined by (7). Thus, starting with $\gamma = y_{(1)}$, we search for the solution over the interval $(0, y_{(1)})$. This method of estimation does not seem to have a significant effect on the sampling distributions of the test statistics, as indicated by the results of Section 6.

Table 1. Asymptotic percentage points of the $W^2$ and $A^2$ statistics

Both parameters known   0.250   0.15    0.10    0.05    0.025   0.010
$W^2$                   0.209   0.284   0.347   0.461   0.581   0.743
$A^2$                   1.248   1.610   1.933   2.492   3.070   3.880

$\alpha$ unknown        0.250   0.15    0.10    0.05    0.025   0.010
$W^2$                   0.116   0.148   0.175   0.222   0.271   0.338
$A^2$                   0.736   0.916   1.062   1.321   1.591   1.959
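The search for the threshold estimate can be sketched as a fixed-point iteration. This assumes that equation (9) has the form $\gamma = y_{(1)}[1 - \{n\hat{\alpha}(\gamma)\}^{-1}]$ with $\hat{\alpha}(\gamma)$ given by (7), and it swaps in a simple iteration started at $y_{(1)}$ for whatever search the authors actually used. The function names and the sample are hypothetical.

```python
import math

def alpha_hat(y, gamma):
    """Maximum-likelihood estimate of alpha for a given gamma (formula (7))."""
    return len(y) / sum(math.log(v / gamma) for v in y)

def gamma_tilde(y, tol=1e-12, max_iter=200):
    """Iterate gamma <- y_(1) * (1 - 1 / (n * alpha_hat(gamma))), starting at y_(1).

    Assumed form of equation (9); the iteration is an illustrative choice.
    """
    n = len(y)
    y_min = min(y)
    gamma = y_min
    for _ in range(max_iter):
        new_gamma = y_min * (1 - 1 / (n * alpha_hat(y, gamma)))
        if new_gamma <= 0:
            raise ValueError("iterate left the admissible interval (0, y_(1))")
        if abs(new_gamma - gamma) <= tol * y_min:
            return new_gamma
        gamma = new_gamma
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical sample, for illustration only.
sample = [1.3, 1.1, 2.7, 1.8, 1.05, 4.2, 1.6]
estimate = gamma_tilde(sample)
```

The estimate lies strictly below the sample minimum, so every transformed value is strictly positive and the logarithms in $A^2$ stay finite.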



4 Asymptotic distributions 

The asymptotic distributions of the EDF statistics were obtained by applying the theory in [19]. The process $\sqrt{n}\{F_n(x) - F(x)\}$, evaluated at $t = F(x)$, converges weakly to $\{Y(t), t \in (0, 1)\}$, a Gaussian process with zero mean and covariance function $\rho(s, t)$, which depends on the parameters estimated and on $F$. The statistics $W^2$ and $A^2$ are asymptotically functionals of $Y(t)$; namely, $W^2 \to \int_0^1 Y^2(t)\,dt$ and $A^2 \to \int_0^1 a^2(t)\,dt$, where $a(t) = Y(t)/\sqrt{t(1-t)}$. Let $\rho^*(s, t)$ denote the covariance function for a given statistic. The limiting distribution of the test statistic is that of $\sum_{j=1}^{\infty} \lambda_j \nu_j$, where $\nu_1, \nu_2, \ldots$ are independent $\chi^2_{(1)}$ random variables and $\lambda_1, \lambda_2, \ldots$ are the eigenvalues of the integral equation

$$\int_0^1 \rho^*(s, t) f_j(s)\,ds = \lambda_j f_j(t) \qquad (10)$$

When the parameters are known, the covariance function is given by $\rho(s, t) = \min(s, t) - st$. When $\gamma$ is known and $\alpha$ is estimated using (7), the covariance function for the limiting process becomes $\rho^*(s, t) = \rho(s, t) - (1 - s)(1 - t) \log(1 - s) \log(1 - t)$. This is the same covariance function that results when testing fit to the exponential distribution with unknown scale parameter, so the asymptotic distribution of the test statistics is the same for testing exponentiality as for testing fit to the Power-law distribution with $\alpha$ unknown. Tables 4.2 and 4.11 in [18] give selected percentage points for the case of known parameters and for testing exponentiality with unknown scale parameter, respectively. They are summarized in Table 1 for quick reference.
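The two covariance functions above can be explored numerically. The sketch below discretizes the integral equation (10) on a midpoint grid, approximates the sum of the eigenvalues by the integral of $\rho(t, t)$, and finds the largest eigenvalue by power iteration. The discretization, the grid sizes and the function names are illustrative assumptions, not the method used in the paper.

```python
import math

def rho_known(s, t):
    """Covariance when both parameters are known: min(s, t) - s t."""
    return min(s, t) - s * t

def rho_alpha_estimated(s, t):
    """Covariance when gamma is known and alpha is estimated by (7)."""
    return rho_known(s, t) - (1 - s) * (1 - t) * math.log(1 - s) * math.log(1 - t)

def midpoints(m):
    """Midpoints of m equal cells of (0, 1); they never touch 0 or 1."""
    return [(i + 0.5) / m for i in range(m)]

def kernel_trace(rho, m=400):
    """Midpoint-rule value of the integral of rho(t, t), i.e. the sum of the eigenvalues."""
    return sum(rho(t, t) for t in midpoints(m)) / m

def largest_eigenvalue(rho, m=100, iterations=60):
    """Power iteration on the m x m discretization of the operator in (10)."""
    grid = midpoints(m)
    matrix = [[rho(s, t) / m for s in grid] for t in grid]
    v = [1.0] * m
    lam = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(m)) for row in matrix]
        lam = math.sqrt(sum(x * x for x in w))  # norm of A v for a unit vector v
        v = [x / lam for x in w]
    return lam

mean_known = kernel_trace(rho_known)
mean_alpha_estimated = kernel_trace(rho_alpha_estimated)
top_known = largest_eigenvalue(rho_known)
```

Since each $\chi^2_{(1)}$ variable has mean 1, the sum of the eigenvalues is the mean of the limiting distribution of $W^2$. For known parameters the trace is 1/6 and the largest eigenvalue is $1/\pi^2$; with $\alpha$ estimated the trace drops to 5/54, which is consistent with the smaller percentage points in the second half of Table 1.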
5 Test procedures

1. Given the ordered sample values $y_{(1)} < \ldots < y_{(n)}$, the test statistics are computed from the values $Z_{(i)} = F(y_{(i)}; \alpha, \gamma)$, where the values of the parameters $\alpha$ and/or $\gamma$ can be replaced by their estimates if they are not known.

2. Compute the value of the test statistic $A^2$ using (5) or $W^2$ from (4).

3. Refer to Table 1 for the appropriate case and significance level.

4. If the value of the test statistic exceeds that from the 
table, reject the Power-law model at the corresponding 
significance level. 
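The four steps above can be sketched as follows for the case of a known threshold. The code assumes the distribution function $F(y; \gamma, \alpha) = 1 - (\gamma/y)^{\alpha}$, copies the percentage points from Table 1, and uses hypothetical names and a hypothetical sample throughout.

```python
import math

# Asymptotic upper-tail percentage points, copied from Table 1.
LEVELS = [0.25, 0.15, 0.10, 0.05, 0.025, 0.01]
CRITICAL = {
    ("both_known", "W2"): [0.209, 0.284, 0.347, 0.461, 0.581, 0.743],
    ("both_known", "A2"): [1.248, 1.610, 1.933, 2.492, 3.070, 3.880],
    ("alpha_unknown", "W2"): [0.116, 0.148, 0.175, 0.222, 0.271, 0.338],
    ("alpha_unknown", "A2"): [0.736, 0.916, 1.062, 1.321, 1.591, 1.959],
}

def power_law_cdf(y, gamma, alpha):
    """Assumed form of (1): F(y) = 1 - (gamma / y)**alpha for y > gamma."""
    return 1.0 - (gamma / y) ** alpha

def edf_statistics(z):
    """Return (W2, A2) computed from formulas (4) and (5)."""
    zs = sorted(z)
    n = len(zs)
    w2 = sum((zs[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1 / (12 * n)
    a2 = -n - sum(
        (2 * i + 1) * (math.log(zs[i]) + math.log(1 - zs[n - 1 - i]))
        for i in range(n)
    ) / n
    return w2, a2

def power_law_gof(y, gamma, alpha=None):
    """Steps 1-2: transform the sample and compute both statistics.

    If alpha is None it is estimated by maximum likelihood (formula (7)).
    """
    case = "both_known"
    if alpha is None:
        alpha = len(y) / sum(math.log(v / gamma) for v in y)
        case = "alpha_unknown"
    z = [power_law_cdf(v, gamma, alpha) for v in y]
    w2, a2 = edf_statistics(z)
    return case, w2, a2

def rejects(statistic, case, name, level):
    """Steps 3-4: compare the statistic with the tabulated point."""
    return statistic > CRITICAL[(case, name)][LEVELS.index(level)]

# Hypothetical sample with known threshold gamma = 1 (illustration only).
sample = [1.3, 1.1, 2.7, 1.8, 1.05, 4.2, 1.6]
case, w2, a2 = power_law_gof(sample, gamma=1.0)
decision = rejects(w2, case, "W2", 0.05)
```

For small samples with an estimated threshold, the asymptotic points in `CRITICAL` would need to be replaced by simulated ones.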

When $\gamma$ is known, the distribution of $X = \log(Y/\gamma)$ is exponential with parameter $\alpha$; i.e., $F_X(x) = 1 - \exp(-\alpha x)$, and the problem of testing fit to the Power-law distribution is equivalent to that of testing the null hypothesis that the transformed observations $x_1, \ldots, x_n$ were drawn from an exponential distribution. If $Y_1, \ldots, Y_n$ is a sample of $n$ independent values from Power-law distributions with parameters $\gamma_1, \ldots, \gamma_n$ and the same value of the parameter $\alpha$, a test of fit can be carried out by transforming $X_i = \log(Y_i/\gamma_i)$ and, again, testing that the transformed values come from an exponential population. It is important to remark that this procedure is computationally equivalent to the procedure described in the previous paragraph, due to the fact that $F_X(x_i) = F_Y(y_i)$.
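The equivalence with the exponential case can be written out in one line. The derivation below assumes the Power-law distribution function $F_Y(y) = 1 - (\gamma/y)^{\alpha}$, which is the form consistent with the log-likelihood (6).

```latex
F_X(x) = P\{\log(Y/\gamma) \le x\} = P\{Y \le \gamma e^{x}\}
       = 1 - \left(\frac{\gamma}{\gamma e^{x}}\right)^{\alpha}
       = 1 - e^{-\alpha x}, \qquad x > 0,
\\[6pt]
F_X(x_i) = 1 - \exp\{-\alpha \log(y_i/\gamma)\}
         = 1 - \left(\frac{\gamma}{y_i}\right)^{\alpha}
         = F_Y(y_i).
```

Both routes therefore yield the same transformed values $Z_i$, and hence the same values of $W^2$ and $A^2$.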



6 Monte Carlo study 

A simulation study was conducted to investigate the speed of convergence of the empirical percentage points of the test statistics to their asymptotic values. Five thousand samples of size n = 20(20)200 were simulated. In the simulation, values of $\gamma$ and $\alpha$ ranging from 1/4 to 10 were used to verify that there was no effect of the parameter values on the speed of convergence; the results showed little or no effect at all. The empirical percentage points presented in Tables 2 and 3 show typical results obtained for $\gamma = 1$ and $\alpha = 5$.

Table 2 shows the empirical percentage points for the test statistics using the known value of $\gamma$ and estimating $\alpha$ using (7). These results clearly suggest that the asymptotic percentage points can be used even for small values of n. When the parameter $\gamma$ is estimated using (9), the convergence of the empirical percentage points to those of the asymptotic distribution is slower. As shown in Table 3, the use of the Monte Carlo percentage points is recommended for n < 100.
It is worth noting that the results in Table 3 are very close to those reported in Table 4.15 of [18] for testing exponentiality with origin and scale parameters unknown, which serves as an external check of these results.
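A small version of such a simulation, for the case of known $\gamma$ and estimated $\alpha$, might look like the sketch below. It assumes the distribution function $F(y) = 1 - (\gamma/y)^{\alpha}$, samples by inverting it, and uses a fixed seed and a reduced number of replications; none of these choices are taken from the original study.

```python
import math
import random

def simulate_w2_point(n, alpha, gamma, level, replications, seed):
    """Empirical upper percentage point of W^2 with gamma known and alpha estimated."""
    rng = random.Random(seed)
    stats = []
    for _ in range(replications):
        # Inverse-CDF sampling: Y = gamma * U**(-1/alpha), with U uniform on (0, 1].
        y = [gamma * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
        # Maximum-likelihood estimate of alpha, formula (7).
        a_hat = n / sum(math.log(v / gamma) for v in y)
        # Probability integral transformation with the estimated alpha.
        z = sorted(1 - (gamma / v) ** a_hat for v in y)
        # Cramer-von Mises statistic, formula (4), with 0-based index i.
        w2 = sum((z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1 / (12 * n)
        stats.append(w2)
    stats.sort()
    # Order statistic that leaves a fraction `level` of the values above it.
    return stats[int(math.ceil((1 - level) * replications)) - 1]

point = simulate_w2_point(n=20, alpha=5.0, gamma=1.0, level=0.05,
                          replications=4000, seed=1)
```

With these settings the simulated 5% point should land near the value 0.222 listed for this case, up to Monte Carlo error.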



Table 2. Empirical percentage points of $W^2$ and $A^2$: $\gamma$ known

$W^2$                  Significance level
n       0.250   0.15    0.10    0.05    0.025   0.010
20      0.116   0.147   0.174   0.223   0.270   0.328
40      0.117   0.151   0.177   0.223   0.266   0.324
60      0.114   0.147   0.174   0.220   0.271   0.337
80      0.116   0.147   0.173   0.219   0.272   0.335
100     0.116   0.148   0.174   0.222   0.271   0.336
∞       0.116   0.148   0.175   0.222   0.271   0.338

$A^2$                  Significance level
n       0.250   0.15    0.10    0.05    0.025   0.010
20      0.728   0.900   1.046   1.302   1.565   1.893
40      0.725   0.903   1.058   1.316   1.587   1.918
60      0.737   0.916   1.054   1.313   1.604   1.940
80      0.727   0.899   1.042   1.274   1.588   1.944
100     0.737   0.920   1.066   1.305   1.552   1.877
∞       0.736   0.916   1.062   1.321   1.591   1.959



Table 3. Empirical percentage points of $W^2$ and $A^2$: $\gamma$ estimated using (9)

$W^2$                  Significance level
n       0.250   0.15    0.10    0.05    0.025   0.010
20      0.106   0.133   0.157   0.198   0.241   0.298
40      0.112   0.145   0.171   0.212   0.258   0.318
60      0.112   0.143   0.172   0.219   0.269   0.325
80      0.114   0.145   0.170   0.214   0.276   0.335
100     0.114   0.147   0.176   0.224   0.273   0.319
∞       0.116   0.148   0.175   0.222   0.271   0.338

$A^2$                  Significance level
n       0.250   0.15    0.10    0.05    0.025   0.010
20      0.615   0.762   0.880   1.091   1.321   1.602
40      0.662   0.825   0.955   1.192   1.428   1.710
60      0.675   0.842   0.973   1.216   1.498   1.826
80      0.685   0.862   0.994   1.236   1.502   1.856
100     0.693   0.866   1.005   1.240   1.523   1.814
∞       0.736   0.916   1.062   1.321   1.591   1.959





7 Application to the analysis of stock price 
variations 

The data analyzed consist of two data sets: the first corresponding to the 126 largest differences $d_i$, and the second corresponding to the absolute values of the 144 smallest differences $d_i^*$, from the series $S(t) = \log Z(t+1) - \log Z(t)$, where $Z(t)$ denotes daily values of the Mexican IPC stock index between April 14, 1990 and December 31, 2002.
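The construction of the two tail data sets can be sketched as follows. The index values are made up for illustration, and the function names are assumptions; only the definition of $S(t)$ and the selection of the largest and smallest differences come from the text.

```python
import math

def log_differences(prices):
    """S(t) = log Z(t+1) - log Z(t) for consecutive index values."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def upper_tail(differences, k):
    """The k largest differences, in increasing order."""
    return sorted(differences)[-k:]

def lower_tail(differences, k):
    """Absolute values of the k smallest differences, in increasing order."""
    return sorted(abs(d) for d in sorted(differences)[:k])

# Hypothetical index values, for illustration only.
prices = [100.0, 102.0, 99.0, 101.0, 97.0, 103.0]
s = log_differences(prices)
largest = upper_tail(s, 2)
smallest_abs = lower_tail(s, 2)
```

Each tail sample can then be passed to the estimation and test steps of Sections 3 and 5.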

Following Gopikrishnan et al. [11, 12], Liu et al. [13], and Mantegna and Stanley [10], we wish to test statistically the null hypothesis that these extreme observations are consistent with the Power-law distribution.
For the data set $d_1, \ldots, d_{126}$, the estimates of the parameters are $\gamma = 0.0313$ and $\alpha = 3.31$, for which the values of the test statistics were found to be $A^2 = 0.5778$ and $W^2 = 0.0995$.

For the second data set, consisting of the absolute values $d_1^*, \ldots, d_{144}^*$ of the smallest differences, we find $\gamma = 0.0275$ and $\alpha = 3.05$; the values of the test statistics were $A^2 = 0.7142$ and $W^2 = 0.1264$.

Fig. 1. Empirical (dash) and Fitted (solid) CDF for differences $d_i$

Fig. 2. Empirical (dash) and Fitted (solid) CDF for differences $d_i^*$

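Referring to the asymptotic percentage points in Table 3, we find that both statistics for the first data set, and $A^2$ for the second, lie below the 0.25 points, so the corresponding p-values are greater than 0.25. The value $W^2 = 0.1264$ for the second data set lies between the 0.25 and 0.15 points (0.116 and 0.148), giving a p-value between 0.15 and 0.25. In no case is the Power-law model rejected at conventional significance levels, indicating a good fit to the Power-law distribution. Figures 1 and 2 show the fitted and empirical cumulative distribution functions for both data sets.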



8 Conclusions 

In this work, the use of the Anderson-Darling and Cramér-von Mises goodness-of-fit statistics has been presented for the case of the Power-law distribution, and the asymptotic distributions of these test statistics have been obtained.

In general, the asymptotic distribution of statistics based on the empirical distribution function does not depend on location and scale parameters, but it may depend on the value of a shape parameter. Here it has been shown that, for the Power-law distribution, these asymptotic distributions do not depend on the particular value of the shape parameter $\alpha$. This was done by calculating the covariance function of the limiting process on which they are based and relating it to the case of the exponential distribution. When the threshold parameter $\gamma$ is known, simulation results suggest that the asymptotic percentage points of the Anderson-Darling and Cramér-von Mises statistics can be used with good accuracy even for small n. For the case of $\gamma$ unknown, an estimator was proposed. From the Monte Carlo results, we conclude that this estimator is useful for goodness-of-fit purposes in the sense that it allows the calculation of the test statistics while preserving the asymptotic distribution, although the convergence appears to be slower in this case.
The proposed test was shown to be useful in analyzing stock price variations, where a significance test of fit for the Power-law distribution is required. The results obtained here for the changes in the Mexican stock exchange index IPC are consistent with previous studies, where the Power-law distribution with shape parameter $\alpha \approx 3$ has been proposed.

A.R.H.M wishes to thank Conacyt-Mexico for financial sup- 
port provided under Grant No. I35646-E/3399-E/5617. 

References 

1. V. Pareto, Cours d'Économie Politique (Lausanne and Paris, 1897).

2. L. Bachelier, Ph.D. Thesis, Théorie de la Spéculation, Annales Scientifiques de l'École Normale Supérieure III-17 (1900) 21-86.

3. Bertrand M. Roehner, Patterns of Speculation. A Study in Observational Econophysics (Cambridge University Press, United Kingdom, 2002) 25-35.

4. Proceedings of the Workshop "Empirical Science of Financial Fluctuations. The Advent of Econophysics", edited by Hideki Takayasu (Workshop Organized by Nihon Keizai Shimbun, Tokyo, 2000).

5. Jean-Philippe Bouchaud, Physica A 313 (2002) 238-251.

6. H. E. Stanley et al., Physica A 269 (1999) 156-159.

7. Dietrich Stauffer, Int. J. Mod. Phys. C 11 (2000) 1081-1087.

8. R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics (Cambridge University Press, United Kingdom, 2000).

9. B. B. Mandelbrot, J. Business 36 (1963) 394-419.

10. R. N. Mantegna and H. E. Stanley, Nature 376 (1995) 46-49.

11. P. Gopikrishnan et al., Eur. Phys. J. B 3 (1998) 139-140.

12. P. Gopikrishnan et al., Phys. Rev. E 60 (1999) 5305-5316.

13. Y. Liu, P. Gopikrishnan et al., Phys. Rev. E 60, 2 (1999) 6517-6529.

14. T. Lux, Applied Financial Economics 6 (1996) 463-475.

15. Y. Liu et al., Phys. Rev. E 60, 2 (1999) 1390-1400.

16. V. Plerou, P. Gopikrishnan, L. A. Amaral, M. Meyer, H. E. Stanley, Phys. Rev. E 60, 6 (1999) 6519-6529.

17. R. A. Lockhart and M. A. Stephens, Estimation and Tests of Fit for the Three-Parameter Weibull Distribution, Research Report 92-10 (Department of Mathematics and Statistics, Simon Fraser University, 1992).

18. R. B. D'Agostino and M. A. Stephens, Goodness-of-Fit Techniques (Marcel Dekker, New York, 1986).

19. J. Durbin, Distribution Theory for Tests Based on the Sample Distribution Function, Regional Conference Series in Applied Mathematics 9 (SIAM, Philadelphia, 1973).



